Limiting disclosure of sensitive data in sequential releases of databases

نویسندگان

  • Erez Shmueli
  • Tamir Tassa
  • Raz Wasserstein
  • Bracha Shapira
  • Lior Rokach
چکیده

Privacy Preserving Data Publishing (PPDP) is a research field that deals with the development of methods to enable publishing of data while minimizing distortion, for maintaining usability on one hand, and respecting privacy on the other hand. Sequential release is a scenario of data publishing where multiple releases of the same underlying table are published over a period of time. A violation of privacy, in this case, may emerge from any one of the releases, or as a result of joining information from different releases. Similarly to [37], our privacy definitions limit the ability of an adversary who combines information from all releases, to link values of the quasi-identifiers to sensitive values. We extend the framework that was considered in [37] in three ways: We allow a greater number of releases, we consider the more flexible local recoding model of “cell generalization” (as opposed to the global recoding model of “cut generalization” in [37]), and we include the case where records may be added to the underlying table from time to time. Our extension of the framework requires also to modify the manner in which privacy is evaluated. We show that while [37] based their privacy evaluation on the notion of the Match Join between the releases, it is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are more suitable for privacy evaluation in this context. We then present a top-down algorithm for anonymizing sequential releases in the cell generalization model, that is based on our modified privacy evaluations. Our theoretical study is followed by experimentation that demonstrates a staggering improvement in terms of utility due to the adoption of the cell generalization model, and exemplifies the correction in the privacy evaluation as offered by using the Full or Kernel Match Joins instead of the Match Join.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Disclosure Limitation of Sensitive Rules

Data products (macrodata or tabular data and microdat a or raw data records), are designed to inform public or bus iness policy, and research or public information. Securi ng these products against unauthorized acces ses has been a long-term goal of the database security research comm unity and the government statistical agencies. Solutions t o this problem require combining several techniques ...

متن کامل

Introducing an algorithm for use to hide sensitive association rules through perturb technique

Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...

متن کامل

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

متن کامل

A Method for Protecting Access Pattern in Outsourced Data

Protecting the information access pattern, which means preventing the disclosure of data and structural details of databases, is very important in working with data, especially in the cases of outsourced databases and databases with Internet access. The protection of the information access pattern indicates that mere data confidentiality is not sufficient and the privacy of queries and accesses...

متن کامل

Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion

Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Sci.

دوره 191  شماره 

صفحات  -

تاریخ انتشار 2012